Atom AI Labs - AI-Powered Multi-Tenant Platform

ATOM SaaS Platform - Data Flow Architecture

Overview

The ATOM SaaS platform uses a **hybrid architecture** with both Next.js (TypeScript) and Python FastAPI backends, deployed across unified Cloud Service Nodes.

Cloud Service Architecture

1. Web Platform - Main Production App

**URL**: https://[tenant].atomagentos.com
**Components**:
Next.js frontend (port 3000) - 153+ API routes
Python FastAPI backend (port 8000) - runs in same node via docker-entrypoint.sh
**Purpose**: Primary web application with both frontend and backend
**Deployment**: Single Dockerfile with multi-process entrypoint

2. API Service - Python Backend API

**URL**: https://[tenant].atomagentos.com/api/v1
**Components**:
Python FastAPI only (port 8000)
Dockerfile.api (backend-only build)
**Purpose**: Dedicated Python backend for specialized services
**Auto-Scale**: Configurable node scaling based on demand

Code Distribution

Next.js API Routes (`src/app/api/`)

**153+ routes** covering most backend logic:

**Authentication**: /api/auth/* (signup, login, password reset, 2FA)
**Chat**: /api/chat - main agent chat interface
**Agents**: /api/v1/agents/* (comments, schedules, plans)
**Graduation**: /api/graduation/* (episodes, readiness, exams)
**Settings**: /api/settings/* (workspace, general settings)
**Admin**: /api/admin/* (health, tenants, users, governance)
**Business**: /api/business/intelligence
**Artifacts**: /api/artifacts
**Ingestion**: /api/ingestion
**Calendar**: /api/calendar
**Proxy**: /api/proxy/[...path] - catch-all for Python backend
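
A minimal sketch of how the catch-all proxy route might translate an incoming path into a Python backend URL. It is written in Python for illustration only, since the real route is a Next.js handler. Only the `/api/proxy/` prefix and `PYTHON_BACKEND_URL` come from this document; the `build_backend_url` helper, the one-to-one path mapping, and the traversal check are assumptions.

```python
import os
from typing import Optional

# Prefix of the catch-all route listed above.
PROXY_PREFIX = "/api/proxy/"


def build_backend_url(request_path: str, query: str = "",
                      backend_url: Optional[str] = None) -> str:
    """Map /api/proxy/<rest> to <PYTHON_BACKEND_URL>/<rest> (assumed mapping)."""
    base = backend_url or os.environ.get("PYTHON_BACKEND_URL", "http://127.0.0.1:8000")
    base = base.rstrip("/")
    if not request_path.startswith(PROXY_PREFIX):
        raise ValueError("not a proxy path: " + request_path)
    rest = request_path[len(PROXY_PREFIX):]
    # Refuse parent-directory segments so the proxy cannot be steered elsewhere.
    if ".." in rest.split("/"):
        raise ValueError("path traversal is not allowed")
    url = base + "/" + rest
    return url + "?" + query if query else url
```

Under these assumptions, a request to `/api/proxy/api/v1/health` would be forwarded to `http://127.0.0.1:8000/api/v1/health` inside the web node.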

Python Backend Services (`backend-saas/core/`)

**Specialized services and workers**:

**Agent Systems**:
agent_governance_service.py - Permission checking, maturity enforcement
agent_world_model.py - Long-term memory with episode tracking
agent_coordination.py - Multi-agent orchestration
agent_promotion_service.py - Graduation and promotion logic

**Brain Systems**:
business_intelligence.py - CRM data processing
communication_intelligence.py - Intent analysis and routing
reasoning_chain.py - Decision tracking with feedback

**Episode & Graduation**:
episode_service.py - Episode tracking with RLHF feedback
graduation_exam.py - Graduation exam execution
graduation_background_worker.py - Periodic eligibility checks

**Worker Processes**:
workers/master_worker.py - Background task processing
workers/ - Various specialized workers

Test Infrastructure

Test Endpoints

**Location**: backend-saas/api/routes/test_auth_routes.py

**Endpoints**:

POST /api/test/auth/signup - Create test user and tenant
POST /api/test/auth/login - Login test user
GET /api/test/health - Health check for test endpoints
POST /api/test/agents - Create test agent
POST /api/test/agents/{id}/execute - Execute test agent skill

**Authentication**:

Requires X-Test-Secret: test-secret-key header
Bypasses normal authentication for E2E testing
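
The header check described above could look roughly like the following sketch. The function name and the case-insensitive header lookup are assumptions, not the code in test_auth_routes.py; the constant-time comparison via `hmac.compare_digest` is a general good practice for secret comparison rather than something this document states.

```python
import hmac
from typing import Mapping

# Header name taken from this document, lowercased for lookup.
TEST_SECRET_HEADER = "x-test-secret"


def is_test_request_authorized(headers: Mapping[str, str], expected_secret: str) -> bool:
    """Return True only when the request carries the expected test secret."""
    if not expected_secret:
        # An unset secret must never authorize anything.
        return False
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get(TEST_SECRET_HEADER, "")
    # compare_digest avoids leaking the secret through timing differences.
    return hmac.compare_digest(supplied.encode(), expected_secret.encode())
```

Because these endpoints bypass normal authentication, a check like this would typically read the expected secret from the environment rather than hard-coding it.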

**Deployment**:

Available on both atom-saas and atom-saas-api
**E2E tests use**: https://[tenant].atomagentos.com/api/v1

Request Flow Diagrams

Frontend Request Flow

User Browser
    ↓
[tenant].atomagentos.com (Cloud Gateway)
    ↓
Next.js Server (port 3000)
    ↓
    ├─→ Static/SSR Pages → Response
    │
    └─→ API Routes (/api/*)
        ├─→ Next.js API handlers (most logic)
        │   ↓
        │   Database (Neon PostgreSQL)
        │   Cache (Redis)
        │   Storage (S3)
        │
        └─→ Proxy to Python (port 8000)
            ↓
            Python FastAPI
            ↓
            Specialized services (world model, graduation, etc.)

E2E Test Request Flow

Playwright Tests
    ↓
HTTPS Request
    ↓
[tenant].atomagentos.com/api/v1 (Cloud Gateway)
    ↓
Python FastAPI (port 8000)
    ↓
Test Auth Routes
    ↓
Database (Neon PostgreSQL)

Data Flow by Feature

Agent Execution Flow

1. User submits task via /api/chat (Next.js)
2. Next.js API route validates request
3. Forwards to Python backend if needed
4. Python services:
   - Check governance (maturity level)
   - Recall experiences from world model
   - Execute via skill executor
   - Record episode to memory
5. Response flows back through Next.js
6. Database records updated
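
The Python-side steps in item 4 can be sketched as a small pipeline. This is an illustration under stated assumptions: maturity is modelled as an integer, and `recall`, `run_skill`, and `record_episode` are hypothetical callables standing in for the world model, skill executor, and episode service.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ExecutionResult:
    allowed: bool
    output: str = ""
    steps: List[str] = field(default_factory=list)


def execute_agent_task(
    task: str,
    agent_maturity: int,
    required_maturity: int,
    recall: Callable[[str], List[str]],
    run_skill: Callable[[str, List[str]], str],
    record_episode: Callable[[str, str], None],
) -> ExecutionResult:
    """Run governance, recall, execution, and episode recording in order."""
    steps = ["governance"]
    # Governance gate: stop before any side effects if maturity is too low.
    if agent_maturity < required_maturity:
        return ExecutionResult(allowed=False, steps=steps)
    experiences = recall(task)
    steps.append("recall")
    output = run_skill(task, experiences)
    steps.append("execute")
    record_episode(task, output)
    steps.append("record")
    return ExecutionResult(allowed=True, output=output, steps=steps)
```

Placing the governance check first means a blocked request never reaches the skill executor and never writes an episode.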

Graduation System Flow

1. Episode created during execution
2. Background worker checks eligibility (periodic)
3. Readiness calculated from episodes
4. If ready, graduation exam triggered
5. Exam validates 5-stage competency
6. On pass: maturity level increased
7. New permissions granted
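
A minimal sketch of the decision logic in steps 3 to 6. The readiness formula (fraction of successful episodes), the minimum episode count, the 0.8 threshold, and the integer maturity level are all hypothetical; the document only states that readiness is calculated from episodes and that the exam covers five stages.

```python
from typing import List

# The exam validates five stages, per the flow above.
EXAM_STAGES = 5


def readiness(episode_successes: List[bool], min_episodes: int = 10) -> float:
    """Assumed readiness metric: success rate, or 0.0 with too few episodes."""
    if len(episode_successes) < min_episodes:
        return 0.0
    return sum(episode_successes) / len(episode_successes)


def try_graduate(maturity: int, episode_successes: List[bool],
                 stage_results: List[bool], threshold: float = 0.8) -> int:
    """Return the new maturity level, unchanged unless the agent is ready and passes."""
    if readiness(episode_successes) < threshold:
        # Not ready: the exam is not triggered at all.
        return maturity
    passed = len(stage_results) == EXAM_STAGES and all(stage_results)
    return maturity + 1 if passed else maturity
```

In a real worker the stage results would come from running the exam, not from a precomputed list; they are passed in here to keep the sketch self-contained.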

Multi-Tenant Data Isolation

1. Request arrives with a tenant subdomain ([tenant].atomagentos.com)
2. Middleware extracts tenant_id from subdomain
3. All queries filtered by tenant_id
4. RLS policies enforced in PostgreSQL
5. Response contains only tenant data
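
Steps 1 to 3 might be sketched as follows, assuming the base domain used in the URLs elsewhere in this document. The helper names, the single-label subdomain rule, the table allow-list, and the `%s` placeholder style are assumptions. The sketch returns the subdomain slug; resolving that slug to a `tenant_id` (for example via the tenants table) is left out.

```python
from typing import Optional, Tuple

# Base domain taken from the URLs in this document.
BASE_DOMAIN = "atomagentos.com"

# Tables assumed to carry a tenant_id column (names from Key Tables).
TENANT_TABLES = {"agent_registry", "agent_executions", "episodes",
                 "episode_feedback", "tenant_settings"}


def extract_tenant_slug(host: str) -> Optional[str]:
    """Return the tenant subdomain from a Host header, or None if invalid."""
    hostname = host.split(":")[0].lower().rstrip(".")
    suffix = "." + BASE_DOMAIN
    if not hostname.endswith(suffix):
        return None
    slug = hostname[: -len(suffix)]
    # Accept exactly one label in front of the base domain.
    if not slug or "." in slug:
        return None
    return slug


def tenant_scoped_query(table: str, tenant_id: str) -> Tuple[str, tuple]:
    """Build a parameterized query that is always filtered by tenant_id."""
    if table not in TENANT_TABLES:
        raise ValueError("unknown tenant-scoped table: " + table)
    return f"SELECT * FROM {table} WHERE tenant_id = %s", (tenant_id,)
```

The application-level filter and the PostgreSQL RLS policies are complementary: if a query forgets the filter, RLS is still expected to keep other tenants' rows out of the result.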

Environment Variables

atom-saas (Main App)

PORT=3000
ROLE=web
PYTHON_BACKEND_URL=http://127.0.0.1:8000

atom-saas-api (Python Backend)

PORT=8000
PYTHONUNBUFFERED=1
DATABASE_URL=<PostgreSQL connection string>
REDIS_URL=<Redis connection string>
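
A minimal sketch of how the Python backend might read these variables at startup and fail fast when a required one is missing. The `ApiSettings` class and `load_api_settings` function are hypothetical names, and the default port of 8000 follows the value shown above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ApiSettings:
    port: int
    database_url: str
    redis_url: str


def load_api_settings(env: Mapping[str, str] = os.environ) -> ApiSettings:
    """Read backend settings from the environment, rejecting missing values."""
    missing = [name for name in ("DATABASE_URL", "REDIS_URL") if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    return ApiSettings(
        port=int(env.get("PORT", "8000")),
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
    )
```

Failing at startup rather than on first use makes a misconfigured deployment show up immediately in the node logs.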

Deployment Notes

Web Platform

**Dockerfile**: Multi-stage build with Next.js + Python
**Entry point**: docker-entrypoint.sh
**Startup sequence**:

1. Start Python backend (port 8000)
2. Wait for backend to be ready (30s timeout)
3. Start Next.js (port 3000) with exec
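
The readiness wait in the startup sequence can be illustrated with a simple TCP poll. This is a Python sketch of the idea only; docker-entrypoint.sh is a shell script and may check readiness differently (for example through the health endpoint). The function name and the polling interval are assumptions.

```python
import socket
import time


def wait_for_backend(host: str = "127.0.0.1", port: int = 8000,
                     timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A startup script built on this would abort the node when the function returns `False` instead of starting Next.js against a backend that never came up.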

API Service

**Dockerfile**: Dockerfile.api (backend-only)
**Health check**: /api/v1/health
**Auto-Scale**: Automated based on traffic patterns
**Start command**: atom-cli nodes start <id>

Common Issues

Issue: API timeouts from web platform

**Cause**: The Python backend running in the same node is not responding

**Solution**: Check backend logs, restart node

Issue: API Service returns 404

**Cause**: Service scaled to zero or startup delay

**Solution**: Check status via atom-cli status

Issue: E2E tests failing

**Cause**: API service not responding or outdated deployment

**Solution**:

1. Check service status: atom-cli status
2. Restart nodes: atom-cli nodes restart <id>
3. Deploy latest: atom-cli deploy

Database Schema

Key Tables

**users** - User accounts
**tenants** - Tenant/subdomain configuration
**agent_registry** - AI agent definitions
**agent_executions** - Execution history
**episodes** - Agent execution cycles (new)
**episode_feedback** - RLHF feedback (new)
**tenant_settings** - BYOK configuration

PostgreSQL Features

**Row Level Security (RLS)**: Tenant isolation
**pgvector**: Semantic search for world model
**Indexes**: Performance optimization

Monitoring & Debugging

Cloud Management Commands

# Check status
atom-cli status

# View logs
atom-cli logs
atom-cli logs --service api

# Restart nodes
atom-cli nodes restart <id>

# Access console
atom-cli console

Health Checks

**Web Platform**: GET https://[tenant].atomagentos.com/health
**API Service**: GET https://[tenant].atomagentos.com/api/v1/health
**Test endpoints**: GET https://[tenant].atomagentos.com/api/v1/test/health
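
The three checks above could be combined into one helper, sketched below. The paths are copied from this list; treating only HTTP 200 as healthy, the `check_health` name, and the injected `fetch_status` callable are assumptions made so the sketch stays free of network calls.

```python
from typing import Callable, Dict

# Health paths as listed above.
HEALTH_PATHS = {
    "web": "/health",
    "api": "/api/v1/health",
    "test": "/api/v1/test/health",
}


def health_urls(tenant: str) -> Dict[str, str]:
    """Build the health-check URL for each service of a tenant."""
    return {name: f"https://{tenant}.atomagentos.com{path}"
            for name, path in HEALTH_PATHS.items()}


def check_health(tenant: str, fetch_status: Callable[[str], int]) -> Dict[str, bool]:
    """Return a per-service healthy flag; connection errors count as unhealthy."""
    results = {}
    for name, url in health_urls(tenant).items():
        try:
            results[name] = fetch_status(url) == 200
        except OSError:
            results[name] = False
    return results
```

In practice `fetch_status` could wrap `urllib.request.urlopen(url).status`; `urlopen` raises `HTTPError` (an `OSError` subclass) for error responses, which this helper already treats as unhealthy.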

Future Improvements

**Separate backend deployment**: Consider running Python backend independently
**Load balancing**: Add load balancer for multiple instances
**Caching**: Implement Redis caching for frequently accessed data
**Monitoring**: Add comprehensive observability with metrics/traces
**Auto-start**: Configure atom-saas-api to auto-start on demand

---

**Last Updated**: 2026-02-09

**Maintained By**: Engineering Team